Tidy Data

Presenter: Olivia Beck
Content Credit: Matthew Beckman, Hadley Wickham

May 17, 2023

“Happy families are all alike; every unhappy family is unhappy in its own way.” – Leo Tolstoy

“Tidy datasets are all alike, but every messy dataset is messy in its own way.” – Hadley Wickham

Vocabulary

Variable

Cases

What is Tidy Data

Three interrelated rules make a dataset tidy:

  1. Each variable must have its own column.
  2. Each observation/case must have its own row.
  3. Each value must have its own cell.

It is your job as the researcher to define the variables, observations, and values.
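The three rules can be illustrated with a minimal sketch in plain Python. The table below is made up for illustration (the students, quizzes, and scores are hypothetical, not course data), and the helper name `to_tidy` is an assumption, not a library function. In the untidy version, the column headers `quiz1` and `quiz2` are really values of a variable. If we define a case as "one student taking one quiz", the tidy version has one row per case and one column per variable.

```python
# Hypothetical untidy table: one row per student, one column per quiz.
# The headers "quiz1" and "quiz2" are values of a variable (quiz), not variables.
untidy = [
    {"student": "A", "quiz1": 8, "quiz2": 9},
    {"student": "B", "quiz1": 6, "quiz2": 7},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows into long rows: one row per (id, column) pair."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is copied into every new row below
            tidy_rows.append(
                {
                    id_column: row[id_column],  # which case this row belongs to
                    variable_name: column,      # the old header becomes a value
                    value_name: value,          # each value gets its own cell
                }
            )
    return tidy_rows


# Case = one student taking one quiz; variables = student, quiz, score.
tidy = to_tidy(untidy, id_column="student", variable_name="quiz", value_name="score")

for row in tidy:
    print(row)
```

A different definition of the case (for example, one row per student) would lead to a different tidy layout, which is why defining the variables and cases comes first.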

Example of Untidy data

Example of Tidy Data

Galton Data

In the 1880s, Francis Galton began developing a mathematical theory of evolution.

Here’s part of a page from his lab notebook. Discuss the following in groups:

A page from Francis Galton’s notebook.

Activity 01: Tidy Data

Put the following tables into tidy form.

Table 1: Galton’s height measurement data

A page from Francis Galton’s notebook.

Table 2: Presidents

Code Books

What is a code book?
